Overview

Dataset statistics

Number of variables: 14
Number of observations: 1000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 523.1 KiB
Average record size in memory: 535.7 B
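
The average record size reported above is consistent with dividing the total in-memory size by the number of observations. A minimal arithmetic check, assuming 1 KiB = 1024 bytes (the variable names are illustrative and not part of the report):

```python
# Values copied from the dataset statistics above.
total_size_kib = 523.1
n_observations = 1000

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")
```

Rounded to one decimal place, the result matches the 535.7 B shown in the report.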

Variable types

NUM: 6
CAT: 5
BOOL: 3

Reproduction

Analysis started: 2020-06-20 14:58:31.051214
Analysis finished: 2020-06-20 14:58:43.011418
Duration: 11.96 seconds
Version: pandas-profiling v2.7.1
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

TrainingTimesLastYear has 34 (3.4%) zeros

Variables

Attrition
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value | Count | Frequency (%)
0 | 843 | 84.3%
1 | 157 | 15.7%
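
The value/count/frequency table above can be reproduced from raw values with a simple tally. The sketch below rebuilds a hypothetical Attrition column from the reported counts; it illustrates the arithmetic and is not the profiler's own code:

```python
from collections import Counter

# Hypothetical column rebuilt from the counts reported above (843 zeros, 157 ones).
attrition = [0] * 843 + [1] * 157

counts = Counter(attrition)
n = len(attrition)

# One (value, count, frequency) row per distinct value, most common first.
table = [
    (value, count, f"{100 * count / n:.1f}%")
    for value, count in counts.most_common()
]
for row in table:
    print(*row)
```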
 

BusinessTravel
Categorical

Distinct count: 3
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value | Count | Frequency (%)
Travel_Rarely | 709 | 70.9%
Travel_Frequently | 199 | 19.9%
Non-Travel | 92 | 9.2%

Length

Max length: 17
Mean length: 13.52
Min length: 10

Character category | Count | Frequency (%)
Lowercase_Letter | 11 | 64.7%
Uppercase_Letter | 4 | 23.5%
Dash_Punctuation | 1 | 5.9%
Connector_Punctuation | 1 | 5.9%

Script | Count | Frequency (%)
Latin | 15 | 88.2%
Common | 2 | 11.8%

Block | Count | Frequency (%)
ASCII | 17 | 100.0%
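
The length statistics for BusinessTravel follow directly from the label lengths weighted by their counts. A minimal sketch, using only the three labels and counts reported above (variable names are illustrative):

```python
# Labels and counts copied from the BusinessTravel frequency table above.
counts = {"Travel_Rarely": 709, "Travel_Frequently": 199, "Non-Travel": 92}

n = sum(counts.values())
max_length = max(len(label) for label in counts)
min_length = min(len(label) for label in counts)

# Mean length is weighted by how often each label occurs.
mean_length = sum(len(label) * count for label, count in counts.items()) / n

print(max_length, mean_length, min_length)
```

The three results agree with the max, mean and min lengths listed in the report.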
 

DistanceFromHome
Real number (ℝ≥0)

Distinct count: 29
Unique (%): 2.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.145
Minimum: 1
Maximum: 29
Zeros: 0
Zeros (%): 0.0%
Memory size: 15.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 7
Q3: 13
95-th percentile: 26
Maximum: 29
Range: 28
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 8.120955912
Coefficient of variation (CV): 0.8880214229
Kurtosis: -0.1030362226
Mean: 9.145
Median Absolute Deviation (MAD): 5
Skewness: 1.008336875
Sum: 9145
Variance: 65.94992492
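
Several of the DistanceFromHome statistics above are tied together by simple identities: the variance is the squared standard deviation, the coefficient of variation is the standard deviation divided by the mean, the range is maximum minus minimum, and the IQR is Q3 minus Q1. The following check uses only the reported numbers; the variable names are illustrative:

```python
import math

# Values copied from the DistanceFromHome statistics above.
mean = 9.145
std = 8.120955912
variance = 65.94992492
cv = 0.8880214229
minimum, maximum = 1, 29
q1, q3 = 2, 13
n_observations = 1000

variance_ok = math.isclose(std ** 2, variance, rel_tol=1e-6)  # variance = std^2
cv_ok = math.isclose(std / mean, cv, rel_tol=1e-6)            # CV = std / mean
value_range = maximum - minimum                               # reported Range
iqr = q3 - q1                                                 # reported IQR
total = mean * n_observations                                 # reported Sum

print(variance_ok, cv_ok, value_range, iqr, round(total))
```

The same identities can be used to sanity-check the other numeric variables in this report.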
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
2 | 143 | 14.3%
1 | 142 | 14.2%
9 | 65 | 6.5%
7 | 62 | 6.2%
10 | 58 | 5.8%
3 | 52 | 5.2%
8 | 52 | 5.2%
4 | 45 | 4.5%
5 | 44 | 4.4%
6 | 44 | 4.4%
Other values (19) | 293 | 29.3%

Minimum 5 values

Value | Count | Frequency (%)
1 | 142 | 14.2%
2 | 143 | 14.3%
3 | 52 | 5.2%
4 | 45 | 4.5%
5 | 44 | 4.4%

Maximum 5 values

Value | Count | Frequency (%)
29 | 21 | 2.1%
28 | 17 | 1.7%
27 | 9 | 0.9%
26 | 18 | 1.8%
25 | 17 | 1.7%
 

EnvironmentSatisfaction
Categorical

Distinct count: 4
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value | Count | Frequency (%)
4 | 308 | 30.8%
3 | 304 | 30.4%
2 | 196 | 19.6%
1 | 192 | 19.2%

Length

Max length: 1
Mean length: 1
Min length: 1

Character category | Count | Frequency (%)
Decimal_Number | 4 | 100.0%

Script | Count | Frequency (%)
Common | 4 | 100.0%

Block | Count | Frequency (%)
ASCII | 4 | 100.0%
 

JobRole
Categorical

Distinct count: 9
Unique (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value | Count | Frequency (%)
Sales Executive | 217 | 21.7%
Research Scientist | 209 | 20.9%
Laboratory Technician | 166 | 16.6%
Manufacturing Director | 97 | 9.7%
Healthcare Representative | 90 | 9.0%
Manager | 74 | 7.4%
Sales Representative | 64 | 6.4%
Research Director | 47 | 4.7%
Human Resources | 36 | 3.6%

Length

Max length: 25
Mean length: 18.024
Min length: 7

Character category | Count | Frequency (%)
Lowercase_Letter | 20 | 69.0%
Uppercase_Letter | 8 | 27.6%
Space_Separator | 1 | 3.4%

Script | Count | Frequency (%)
Latin | 28 | 96.6%
Common | 1 | 3.4%

Block | Count | Frequency (%)
ASCII | 29 | 100.0%
 

JobSatisfaction
Categorical

Distinct count: 4
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value | Count | Frequency (%)
3 | 321 | 32.1%
4 | 306 | 30.6%
1 | 188 | 18.8%
2 | 185 | 18.5%

Length

Max length: 1
Mean length: 1
Min length: 1

Character category | Count | Frequency (%)
Decimal_Number | 4 | 100.0%

Script | Count | Frequency (%)
Common | 4 | 100.0%

Block | Count | Frequency (%)
ASCII | 4 | 100.0%
 

MaritalStatus
Categorical

Distinct count: 3
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value | Count | Frequency (%)
Married | 469 | 46.9%
Single | 314 | 31.4%
Divorced | 217 | 21.7%

Length

Max length: 8
Mean length: 6.903
Min length: 6

Character category | Count | Frequency (%)
Lowercase_Letter | 11 | 78.6%
Uppercase_Letter | 3 | 21.4%

Script | Count | Frequency (%)
Latin | 14 | 100.0%

Block | Count | Frequency (%)
ASCII | 14 | 100.0%
 

MonthlyIncome
Real number (ℝ≥0)

Distinct count: 941
Unique (%): 94.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6464.418
Minimum: 1009
Maximum: 19999
Zeros: 0
Zeros (%): 0.0%
Memory size: 15.6 KiB

Quantile statistics

Minimum: 1009
5-th percentile: 2120.7
Q1: 2874
median: 4877.5
Q3: 8393
95-th percentile: 17660.3
Maximum: 19999
Range: 18990
Interquartile range (IQR): 5519

Descriptive statistics

Standard deviation: 4685.919516
Coefficient of variation (CV): 0.7248787927
Kurtosis: 1.031928028
Mean: 6464.418
Median Absolute Deviation (MAD): 2174.5
Skewness: 1.374173983
Sum: 6464418
Variance: 21957841.71
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
5562 | 3 | 0.3%
2380 | 3 | 0.3%
2342 | 3 | 0.3%
2404 | 3 | 0.3%
2610 | 3 | 0.3%
2451 | 3 | 0.3%
3452 | 2 | 0.2%
17861 | 2 | 0.2%
2269 | 2 | 0.2%
2720 | 2 | 0.2%
Other values (931) | 974 | 97.4%

Minimum 5 values

Value | Count | Frequency (%)
1009 | 1 | 0.1%
1051 | 1 | 0.1%
1052 | 1 | 0.1%
1081 | 1 | 0.1%
1118 | 1 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
19999 | 1 | 0.1%
19973 | 1 | 0.1%
19926 | 1 | 0.1%
19859 | 1 | 0.1%
19847 | 1 | 0.1%
 

OverTime
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value | Count | Frequency (%)
No | 716 | 71.6%
Yes | 284 | 28.4%
 

TrainingTimesLastYear
Real number (ℝ≥0)

ZEROS
Distinct count: 7
Unique (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.841
Minimum: 0
Maximum: 6
Zeros: 34
Zeros (%): 3.4%
Memory size: 15.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
median: 3
Q3: 3
95-th percentile: 5
Maximum: 6
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.300542352
Coefficient of variation (CV): 0.4577762592
Kurtosis: 0.4313947547
Mean: 2.841
Median Absolute Deviation (MAD): 1
Skewness: 0.5676822032
Sum: 2841
Variance: 1.69141041
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
2 | 362 | 36.2%
3 | 346 | 34.6%
5 | 89 | 8.9%
4 | 75 | 7.5%
6 | 48 | 4.8%
1 | 46 | 4.6%
0 | 34 | 3.4%

Minimum 5 values

Value | Count | Frequency (%)
0 | 34 | 3.4%
1 | 46 | 4.6%
2 | 362 | 36.2%
3 | 346 | 34.6%
4 | 75 | 7.5%

Maximum 5 values

Value | Count | Frequency (%)
6 | 48 | 4.8%
5 | 89 | 8.9%
4 | 75 | 7.5%
3 | 346 | 34.6%
2 | 362 | 36.2%
 

CommunicationSkill
Real number (ℝ≥0)

Distinct count: 5
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.041
Minimum: 1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Memory size: 15.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 3
Q3: 4
95-th percentile: 5
Maximum: 5
Range: 4
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.413972531
Coefficient of variation (CV): 0.4649695926
Kurtosis: -1.296248835
Mean: 3.041
Median Absolute Deviation (MAD): 1
Skewness: -0.0428380009
Sum: 3041
Variance: 1.999318318
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
5 | 207 | 20.7%
4 | 206 | 20.6%
3 | 201 | 20.1%
2 | 193 | 19.3%
1 | 193 | 19.3%

Minimum 5 values

Value | Count | Frequency (%)
1 | 193 | 19.3%
2 | 193 | 19.3%
3 | 201 | 20.1%
4 | 206 | 20.6%
5 | 207 | 20.7%

Maximum 5 values

Value | Count | Frequency (%)
5 | 207 | 20.7%
4 | 206 | 20.6%
3 | 201 | 20.1%
2 | 193 | 19.3%
1 | 193 | 19.3%
 

OwnStocks
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value | Count | Frequency (%)
1 | 572 | 57.2%
0 | 428 | 42.8%
 

PropWorkLife
Real number (ℝ≥0)

Distinct count: 346
Unique (%): 34.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.28684550215494325
Minimum: 0.0
Maximum: 0.6727272727272727
Zeros: 7
Zeros (%): 0.7%
Memory size: 15.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.04751082251
Q1: 0.1794871795
median: 0.2631578947
Q3: 0.4
95-th percentile: 0.5661672216
Maximum: 0.6727272727
Range: 0.6727272727
Interquartile range (IQR): 0.2205128205

Descriptive statistics

Standard deviation: 0.1538261788
Coefficient of variation (CV): 0.5362684012
Kurtosis: -0.5050614235
Mean: 0.2868455022
Median Absolute Deviation (MAD): 0.09465881685
Skewness: 0.43071975
Sum: 286.8455022
Variance: 0.0236624933
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0.3333333333 | 22 | 2.2%
0.2 | 21 | 2.1%
0.5 | 19 | 1.9%
0.25 | 18 | 1.8%
0.2222222222 | 17 | 1.7%
0.2857142857 | 16 | 1.6%
0.3076923077 | 13 | 1.3%
0.3225806452 | 13 | 1.3%
0.1666666667 | 12 | 1.2%
0.4 | 12 | 1.2%
Other values (336) | 837 | 83.7%

Minimum 5 values

Value | Count | Frequency (%)
0 | 7 | 0.7%
0.01960784314 | 1 | 0.1%
0.02222222222 | 1 | 0.1%
0.02631578947 | 1 | 0.1%
0.02857142857 | 6 | 0.6%

Maximum 5 values

Value | Count | Frequency (%)
0.6727272727 | 2 | 0.2%
0.6666666667 | 1 | 0.1%
0.6607142857 | 1 | 0.1%
0.6603773585 | 1 | 0.1%
0.6551724138 | 1 | 0.1%
 

PropExpComp
Real number (ℝ≥0)

Distinct count: 154
Unique (%): 15.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.143472619047619
Minimum: 0.0
Maximum: 38.0
Zeros: 7
Zeros (%): 0.7%
Memory size: 15.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.5
Q1: 1.6
median: 3
Q3: 5
95-th percentile: 10.5
Maximum: 38
Range: 38
Interquartile range (IQR): 3.4

Descriptive statistics

Standard deviation: 4.063484827
Coefficient of variation (CV): 0.9806954699
Kurtosis: 17.54727075
Mean: 4.143472619
Median Absolute Deviation (MAD): 1.8
Skewness: 3.270087499
Sum: 4143.472619
Variance: 16.51190894
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
5 | 81 | 8.1%
3 | 59 | 5.9%
0.5 | 57 | 5.7%
2 | 54 | 5.4%
1 | 47 | 4.7%
4 | 47 | 4.7%
2.5 | 42 | 4.2%
6 | 35 | 3.5%
10 | 29 | 2.9%
4.5 | 26 | 2.6%
Other values (144) | 523 | 52.3%

Minimum 5 values

Value | Count | Frequency (%)
0 | 7 | 0.7%
0.3 | 2 | 0.2%
0.375 | 1 | 0.1%
0.4 | 3 | 0.3%
0.4285714286 | 1 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
38 | 1 | 0.1%
37 | 1 | 0.1%
34 | 2 | 0.2%
28 | 1 | 0.1%
23 | 1 | 0.1%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that, for a linear relationship, the steepness of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
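
The calculation described above can be sketched in a few lines of plain Python. This is an illustrative implementation on made-up data, not the code the report was generated with; `pearson_r` is a hypothetical helper name:

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]

r = pearson_r(x, y)
# Shifting and rescaling x (by a positive factor) leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], y)
# A perfectly linear, increasing relationship gives r close to +1.
r_linear = pearson_r(x, [2 * a + 1 for a in x])
print(r, r_rescaled, r_linear)
```

The second call illustrates the invariance under changes in location and scale mentioned above.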

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
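
As a rough sketch of this procedure, the snippet below replaces each value by its rank (ties receive the average of the positions they occupy, which is an assumption of this sketch) and then applies the Pearson formula to the ranks. The helper names and the data are illustrative only:

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # the 1/(n-1) factors cancel

def spearman_rho(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))

# Made-up monotonic but nonlinear data: y = x^3.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

On this example ρ is (up to floating-point error) 1 because the relationship is perfectly monotonic, while Pearson's r stays below 1 because it is not linear.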

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
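
The pair-counting definition above can be written out directly. This minimal sketch implements exactly that formula (often called tau-a); tied pairs count as neither concordant nor discordant, which is an assumption of this sketch, and statistical libraries may apply a tie correction instead. The function name and data are illustrative:

```python
from itertools import combinations

def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

# Made-up data: one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau(x, y)
print(tau)
```

Here five of the six pairs are concordant and one is discordant, so τ = (5 - 1) / 6.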

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
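
For illustration, the sketch below computes a bias-corrected Cramér's V from a contingency table using the correction commonly attributed to Bergsma (2013). It is an assumption-laden example rather than the report's own implementation: the function name and the tables are made up, and every row and column is assumed to have a non-zero total.

```python
import math

def cramers_v_corrected(table):
    """table: list of rows of observed counts (r rows, k columns)."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

# Hypothetical tables: independent variables, then perfectly associated ones.
v_independent = cramers_v_corrected([[10, 20], [20, 40]])
v_perfect = cramers_v_corrected([[50, 0], [0, 50]])
print(v_independent, v_perfect)
```

The two example tables exercise the endpoints of the range described above: independence gives 0 and perfect association gives 1 (up to floating-point error).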

Missing values

Sample

First rows

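Index | Attrition | BusinessTravel | DistanceFromHome | EnvironmentSatisfaction | JobRole | JobSatisfaction | MaritalStatus | MonthlyIncome | OverTime | TrainingTimesLastYear | CommunicationSkill | OwnStocks | PropWorkLife | PropExpComp
0 | 0 | Non-Travel | 2 | 3 | Laboratory Technician | 4 | Single | 2564 | No | 2 | 4 | 0 | 0.400000 | 12.000000
1 | 0 | Travel_Rarely | 12 | 3 | Manufacturing Director | 3 | Married | 4663 | Yes | 2 | 2 | 1 | 0.194444 | 0.700000
2 | 1 | Travel_Rarely | 2 | 3 | Sales Executive | 4 | Single | 5160 | No | 3 | 5 | 0 | 0.218182 | 2.400000
3 | 0 | Travel_Rarely | 24 | 1 | Research Scientist | 4 | Single | 4108 | No | 2 | 4 | 0 | 0.461538 | 2.250000
4 | 0 | Travel_Rarely | 3 | 3 | Manufacturing Director | 3 | Married | 9434 | No | 2 | 1 | 1 | 0.270270 | 5.000000
5 | 0 | Travel_Rarely | 7 | 2 | Sales Representative | 3 | Married | 2329 | No | 2 | 2 | 0 | 0.419355 | 3.250000
6 | 1 | Travel_Rarely | 1 | 4 | Laboratory Technician | 3 | Single | 3730 | Yes | 2 | 1 | 0 | 0.125000 | 4.000000
7 | 0 | Travel_Rarely | 4 | 1 | Laboratory Technician | 2 | Married | 3838 | No | 5 | 5 | 0 | 0.242424 | 0.888889
8 | 0 | Travel_Frequently | 11 | 4 | Sales Executive | 4 | Divorced | 4968 | No | 3 | 4 | 1 | 0.142857 | 2.500000
9 | 1 | Travel_Rarely | 7 | 2 | Sales Representative | 2 | Single | 2679 | No | 3 | 5 | 0 | 0.047619 | 0.500000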

Last rows

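Index | Attrition | BusinessTravel | DistanceFromHome | EnvironmentSatisfaction | JobRole | JobSatisfaction | MaritalStatus | MonthlyIncome | OverTime | TrainingTimesLastYear | CommunicationSkill | OwnStocks | PropWorkLife | PropExpComp
990 | 0 | Travel_Frequently | 6 | 1 | Laboratory Technician | 1 | Married | 5562 | Yes | 3 | 5 | 1 | 0.250000 | 2.250000
991 | 0 | Travel_Frequently | 10 | 4 | Research Scientist | 3 | Divorced | 3815 | Yes | 4 | 1 | 1 | 0.147059 | 2.500000
992 | 0 | Travel_Rarely | 1 | 3 | Healthcare Representative | 4 | Divorced | 9613 | No | 5 | 3 | 1 | 0.487179 | 19.000000
993 | 1 | Travel_Frequently | 9 | 3 | Sales Executive | 3 | Married | 12936 | No | 3 | 3 | 0 | 0.531915 | 3.125000
994 | 0 | Travel_Rarely | 7 | 3 | Healthcare Representative | 1 | Married | 9985 | No | 1 | 1 | 1 | 0.232558 | 1.111111
995 | 0 | Non-Travel | 10 | 2 | Sales Executive | 4 | Single | 9980 | No | 3 | 4 | 0 | 0.277778 | 5.000000
996 | 0 | Travel_Rarely | 16 | 3 | Manufacturing Director | 4 | Single | 7945 | Yes | 2 | 2 | 0 | 0.450000 | 2.571429
997 | 1 | Travel_Rarely | 9 | 3 | Sales Executive | 4 | Single | 9619 | No | 3 | 4 | 0 | 0.195652 | 4.500000
998 | 0 | Travel_Rarely | 2 | 3 | Manufacturing Director | 4 | Single | 6877 | Yes | 4 | 5 | 0 | 0.400000 | 2.000000
999 | 0 | Travel_Frequently | 2 | 3 | Sales Executive | 2 | Married | 7525 | No | 2 | 3 | 1 | 0.566038 | 10.000000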